This section remakes the earlier graphs in R, in a slightly simpler form, and animates them. The data comes from all.csv. The .csv file contains 218 federations recognized by FIDE, so the first step after reading in the data…
all <- read.csv(file = "all.csv")
…will be selecting just a few countries to focus on:
focus <- subset(all, fed == "USA" | fed == "IND" | fed == "CHN" | fed == "PUR"
| fed == "RUS" | fed == "NOR" | fed == "FRA")
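As a side note, the chain of `==` comparisons can also be written with `%in%`, which is easier to extend when the list of federations changes. Here is a minimal sketch on a made-up data frame (`toy` and `keep` are illustrative names, not part of the project):

```r
# Toy stand-in for all.csv; the real file has many more rows and columns.
toy <- data.frame(
  fed  = c("USA", "IND", "GER", "NOR", "ESP"),
  year = c(2020, 2020, 2020, 2020, 2020),
  stringsAsFactors = FALSE
)

# The federations selected in the text
keep <- c("USA", "IND", "CHN", "PUR", "RUS", "NOR", "FRA")

# Chained comparisons (shortened to the three codes present in the toy data)
by_or <- subset(toy, fed == "USA" | fed == "IND" | fed == "NOR")

# The same selection with %in%
by_in <- subset(toy, fed %in% keep)
```

Both calls keep the same rows here; `%in%` just keeps the condition on one line.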
A few libraries are necessary for this project:
library(ggplot2)
library(gganimate)
library(RColorBrewer)
library(gifski)
A good first step is to make a very simple animation just to get familiar with the process.
# for each federation, we plot the best-rated player's rating
# (columns are referenced by name inside aes(); ggplot looks them up in focus)
p <- ggplot(focus, aes(x = fed, y = best_rating)) + geom_bar(stat = "identity") +
transition_time(year)
image <- animate(p, renderer = gifski_renderer())
print(image)
yields:
Neat, but not very impressive, and the rendering will only take longer from here. Still, my idea was to recreate something like the visualizations found here: https://www.youtube.com/watch?v=z2DHpW79w0, where a few top players are represented by points moving through time. We’ll start by changing the grey bars to colored points:
# the points use the color aesthetic, so the palette goes on the color scale
p <- ggplot(focus, aes(x = year, y = best_rating, group = year,
color = factor(fed))) +
geom_point() + scale_color_brewer(palette = "Paired")
… and cleaning up the y-axis a little bit:
p <- p + scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
I tried to keep the past frames visible as the points moved using shadow_wake(), but it did not quite work.
p_anim <- p + transition_time(year) +
shadow_wake(wake_length = 0.5, alpha = FALSE)
image <- animate(p_anim, renderer = gifski_renderer())
print(image)
The transition is still not very smooth, the dots are very small, and they don’t stick around well. Maybe this had to do with the wake_length = 0.5 setting, but I moved on from this method before really trying to work it out. Next to try was combining line and point geometries.
# one line per federation, colored by federation
p <- ggplot(focus, aes(x = year, y = best_rating, group = fed,
color = factor(fed))) +
geom_line(size = 1) + scale_color_brewer(palette = "Paired") +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
geom_point(size = 1.5)
Rendering this plot with
p_anim <- p + transition_reveal(year) + view_follow(fixed_y = TRUE)
image <- animate(p_anim, renderer = gifski_renderer())
print(image)
gave something a lot closer to what I had in mind. view_follow() adapts the plot bounds to match the data being shown:
This might be a good time to compare transition_time() and transition_reveal(). The impression I get is that transition_time() shows the state of the plot at each point in time, so only the data for that moment is drawn, while transition_reveal() gradually exposes more of a single plot, keeping what has already been drawn. However it works, this animation is starting to take shape. Let’s use another package.
install.packages("ggrepel")
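To make that difference concrete, here is a rough sketch in base R of which rows each kind of frame would draw. It assumes the simplified reading given above and ignores the in-between frames gganimate interpolates; `frame_time()` and `frame_reveal()` are made-up helper names, and the ratings are invented.

```r
# Toy data: one federation's top rating by year (values are made up)
toy <- data.frame(year = 2016:2020,
                  best_rating = c(2700, 2710, 2725, 2720, 2740))

# A transition_time()-style frame at time t: only the data for that moment
frame_time <- function(df, t) df[df$year == t, ]

# A transition_reveal()-style frame at time t: everything up to that moment
frame_reveal <- function(df, t) df[df$year <= t, ]

n_time   <- nrow(frame_time(toy, 2018))    # the 2018 point only
n_reveal <- nrow(frame_reveal(toy, 2018))  # 2016 through 2018
```

This is why the line geometry works so much better with transition_reveal(): a line needs the earlier points to still be there.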
library(ggrepel)
This is for making sure labels don’t overlap. I thought the graph would look better with the top players’ names next to their lines, and for this final draft I fixed the axis labels as well.
# Setting up the plot
p <- p + labs(x = "Year", y = "Top Player Rating",
title = "Selected Countries' Top Player",
colour = "Country")
# Preparing the animation
p_anim <- p + geom_label_repel(aes(label = best_name)) +
transition_reveal(year) + view_follow(fixed_y = TRUE)
# Animating
image <- animate(p_anim, duration = 46, fps = 12, renderer = gifski_renderer())
print(image)
I increased this animation’s duration so the player names might stay readable a little longer, and it took a few minutes to render.
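A back-of-the-envelope check on why the render got slow: when duration and fps are both given, the number of frames is their product. The sketch below compares that against gganimate's documented default of 100 frames. It assumes render time grows roughly in proportion to the frame count, which I have not measured; `frames_for()` is just an illustrative helper.

```r
# Frames rendered = duration (seconds) * frames per second
frames_for <- function(duration, fps) duration * fps

default_frames <- 100                # animate()'s default nframes
long_frames <- frames_for(46, 12)    # the settings used above

# Rough multiple of the default workload
workload_ratio <- long_frames / default_frames
```

So this final draft asks for several times as many frames as the earlier animations, each with labels to place.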
The names get a little jumpy towards the end of the animation, as the selected countries’ top ratings get very close and switch ranking often. The labels even cover up the lines at the end, so the static plot is useful as well.
Here we’ll look at plots showing the growth of chess over the world. Or, more accurately, the growth of FIDE as an organization. This will use the same data set that’s already been read in: all. Something that the previous plots did not touch on is the total number of chess players registered under FIDE. So, this graph should ideally show the total while still emphasizing a few selected countries.
p <- ggplot(all, aes(x = year, y = num_players, group = fed)) +
geom_bar(stat = "identity", aes(fill = fed))
Wow.
So, there’s quite a bit wrong with this. Let’s start by addressing the legend. I used scale_fill_manual() to solve this problem; its breaks argument controls which values appear in the legend.
p <- p + scale_fill_manual(breaks = c("RUS", "IND", "GER", "ESP", "FRA", "IRI", "POL"),
values = c(colorRampPalette(rev(brewer.pal(9,"Spectral")))(218)))
The values argument was adapted from Javier’s use in the Chapel Hill temperature graphs. That 218 is for the number of federations in the data frame, so each will have their own color. But, now there are more issues with the plot.
This still does not emphasize the selected countries (the seven most “populous” in 2020). My idea here was to set all of the other countries to some grey, then pick new colors for these seven. At this point, we’ll also switch back to the “Paired” color palette that the other final graphs use.
# Setting every country to an arbitrary grey
cols <- rep("#AAAAAA", 218)
# Federation codes in level order; factor() makes this work whether
# read.csv() returned fed as a factor or as plain character strings
feds <- levels(factor(all$fed))
# the colors for "Paired" palette:
# [1] "#A6CEE3" "#1F78B4" "#B2DF8A" "#33A02C" "#FB9A99" "#E31A1C"
# [7] "#FDBF6F" "#FF7F00" "#CAB2D6" "#6A3D9A" "#FFFF99" "#B15928"
cols[feds == "RUS"] <- "#FDBF6F"
cols[feds == "USR"] <- "#FDBF6F"
cols[feds == "IND"] <- "#33A02C"
cols[feds == "GER"] <- "#B2DF8A"
cols[feds == "ESP"] <- "#A6CEE3"
cols[feds == "FRA"] <- "#1F78B4"
cols[feds == "IRI"] <- "#FB9A99"
cols[feds == "POL"] <- "#E31A1C"
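An alternative to indexing by position is a named color vector, since scale_fill_manual() matches a named values vector to the data by name. The sketch below uses a short, hypothetical list of federation codes in place of the real 218; `feds_toy`, `cols_named`, and `highlight` are illustrative names.

```r
# Hypothetical short list of federation codes standing in for the real 218
feds_toy <- c("ESP", "FRA", "GER", "IND", "NOR", "RUS", "USA", "USR")

# Start with every federation grey, named by its code
cols_named <- setNames(rep("#AAAAAA", length(feds_toy)), feds_toy)

# Overwrite only the highlighted federations ("Paired" palette values)
highlight <- c(RUS = "#FDBF6F", USR = "#FDBF6F", IND = "#33A02C",
               GER = "#B2DF8A", ESP = "#A6CEE3", FRA = "#1F78B4")
cols_named[names(highlight)] <- highlight
```

With names attached, the colors no longer depend on the order of the factor levels.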
Now each of the selected countries has its own color to stand out from the rest. For simplicity, I assigned “RUS” and “USR”, Russia and the USSR, the same color. I’ll interject with one additional library: library(scales) allows for changing the y-axis from scientific notation.
p <- ggplot(all, aes(x = year, y = num_players, group = fed)) +
geom_bar(stat = "identity", aes(fill = fed))
# Now using our custom color scheme and neatening the labels:
p <- p + scale_fill_manual(breaks = c("ESP", "FRA", "GER", "IND", "IRI", "POL", "RUS"),
values = cols) +
labs(x = "Year", y = "Total Chess Players",
title = "Global Growth of FIDE", fill = "Top Countries (2020)") +
scale_y_continuous(labels = comma)
That’s the final static plot for graphing the changing size of the global chess community. The animated version follows below. None of the code for this process was saved, and no plots were saved for in-between stages. I’ll work on recreating this.
Another question we wanted to address was the relationship between age and skill, or at least between age and rating. To get this data without having to go through any cleaning process, I used the FIDE website, http://ratings.fide.com/download.phtml, and only used data available in .xml format (2013-2020).
library(XML)
df <- xmlParse(file = "xml_ratings\\standard_jan13frl_xml.xml")
df <- xmlToDataFrame(df)
That xmlToDataFrame() took five to ten minutes for each file. These sets get extremely large, so working with all of the data is impractical. It’s a lot easier to just select the grandmasters from this large set.
# First changing the class of these columns so they can be easily ordered later
df$rating <- as.numeric(as.character(df$rating))
df$birthday <- as.numeric(as.character(df$birthday))
# Then selecting for grandmasters only
gms13 <- df[df$title == "GM",]
gms13 <- gms13[order(-gms13$rating),]
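Since the XML conversion is so slow, one option is to cache each converted data frame on disk the first time it is built. This is only a sketch: `cached_load()` is a hypothetical helper, not part of the XML package, and the stand-in loader below just builds a tiny made-up data frame instead of parsing a real file.

```r
# cached_load() is a hypothetical helper, not part of any package.
# `loader` is any zero-argument function that produces the data frame.
cached_load <- function(cache_path, loader) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))   # fast path: reuse the saved result
  }
  result <- loader()              # slow path: run the conversion once
  saveRDS(result, cache_path)
  result
}

# Demonstration with a cheap stand-in for the slow XML conversion
calls <- 0
slow_loader <- function() {
  calls <<- calls + 1             # count how often the slow step runs
  data.frame(name = c("A", "B"), rating = c(2600, 2450))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_load(cache_file, slow_loader)  # runs the loader, writes the cache
second <- cached_load(cache_file, slow_loader)  # reads the cache instead
```

In the real workflow the loader would wrap the xmlParse() and xmlToDataFrame() calls for one file.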
So far, these examples have used the January 2013 data set. I thought it would be easiest to go through these steps for each .xml file, and to combine the results only once every year had its own grandmaster-only data frame. There are only a few more steps to make sure the data will work well:
# Adding a new column with the year so the data can be distinguished
gms13$year <- 2013
# Adding this column to some of the data frames.
# The column "foa_title" was not always present
gms13$foa_title <- NA
Combining the data frames is simple enough with the rbind() function:
all_gms <- rbind(gms13, gms14, gms15, gms16, gms17, gms18, gms19, gms20)
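The per-year steps could also be wrapped in a function so each file goes through exactly the same treatment. The sketch below is one way to do that under the assumptions in the text; `prep_gms()` is a hypothetical helper, and the two raw data frames are made-up rows, not FIDE data.

```r
# prep_gms() is a hypothetical helper bundling the steps described above.
prep_gms <- function(df, year) {
  df$rating   <- as.numeric(as.character(df$rating))
  df$birthday <- as.numeric(as.character(df$birthday))
  gms <- df[df$title == "GM", ]            # grandmasters only
  gms <- gms[order(-gms$rating), ]         # highest rating first
  gms$year <- year
  # "foa_title" is not present in every file; add it so rbind() can match columns
  if (!"foa_title" %in% names(gms)) gms$foa_title <- NA
  gms
}

# Made-up rows standing in for two yearly rating lists
raw13 <- data.frame(name = c("A", "B", "C"), title = c("GM", "IM", "GM"),
                    rating = c("2600", "2450", "2710"),
                    birthday = c("1985", "1990", "1978"),
                    stringsAsFactors = FALSE)
raw14 <- cbind(raw13, foa_title = c("", "", "AIM"), stringsAsFactors = FALSE)

# rbind() on data frames matches columns by name, so column order may differ
yearly <- list(prep_gms(raw13, 2013), prep_gms(raw14, 2014))
all_gms_toy <- do.call(rbind, yearly)
```

Collecting the yearly frames in a list and calling do.call(rbind, ...) avoids typing out every gmsXX name.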
I exported this data frame to a .csv at this point. Waiting for the .xml files to convert should never have to happen again. Anyways, here’s the static plot we’ll use:
library(ggridges) # provides geom_density_ridges_gradient()
p <- ggplot(gms20, aes(x = rating, y = birthday,
group = birthday, fill = ..x..))
# Set the colors right
p <- p + scale_fill_gradientn(colors = colorRampPalette(brewer.pal(9, "RdYlGn"))(100)) +
geom_density_ridges_gradient(scale = 5, rel_min_height = 0.05)
# Fix the axes and remove the legend
p <- p + scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme_light() + theme(legend.position = "none") +
labs(x = "Rating", y = "Birth Year", title = "Age-Rating Relationship")
# Add a line for 2500 rating - the minimum required to get the GM title
p <- p + geom_vline(xintercept = 2500, color = "darkblue") + xlim(2000, 3000)
The first attempt at animating it actually went pretty well.
# p needs to be built from all_gms (not just gms20) so that year varies
p_anim <- p + transition_states(year, transition_length = 2)
image <- animate(p_anim, renderer = gifski_renderer())
print(image)
While transition_length = 2 slowed down the time in between years, I still don’t like how this animation pauses on each year. state_length = 0 solves that problem: 0 time spent on each state.
p_anim <- p + transition_states(year, transition_length = 3, state_length = 0)
The animation might be more useful if the year of the frame were shown as well:
p_anim <- p_anim + labs(title = "Age-Rating Relationship: {closest_state}")
image <- animate(p_anim, renderer = gifski_renderer())
print(image)